Using K-Means clustering for image segmentation:
1) Bring your own data: find three images you'd like to work with.
2) Exp 1: Change k and try out how segmentation results vary.
3) Exp 2: Add spatial information in clustering and segmentation.
4) Display the results by replacing original pixel values with pixel values of centroids.
5) Write a brief report with segmentation results. In your report, show both raw images and various output images with different k values, and provide a summary of what you've learned by trying the algorithm in different ways.
import pandas as pd
import numpy as np
import cv2
import matplotlib as mpl
import matplotlib.pyplot as plt
from tqdm import tqdm
import os
import glob
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = [3, 5, 7]
attempts = 10
figure_size = 20
image_files = glob.glob(os.getcwd() + "/*.jpeg")
image_files
['/Users/mcmahonmc/Github/machine-learning-2021/01_kmeans/IMG_1760.jpeg', '/Users/mcmahonmc/Github/machine-learning-2021/01_kmeans/IMG_9548.jpeg', '/Users/mcmahonmc/Github/machine-learning-2021/01_kmeans/IMG_2085.jpeg']
for image in image_files:
    print(image)

    img0 = cv2.imread(image)
    img0 = cv2.cvtColor(img0, cv2.COLOR_BGR2RGB)

    # convert to float32, as required by cv2.kmeans
    img = img0.astype(np.float32)

    # convert the MxNx3 image to a (M*N)x3 array of pixels
    vectorized = img.reshape((-1, 3))

    for k in K:
        ret, label, center = cv2.kmeans(vectorized, k, None, criteria, attempts, cv2.KMEANS_PP_CENTERS)
        center = np.uint8(center)

        # map each pixel's cluster label back to its centroid's color
        res = center[label.flatten()]
        result_image = res.reshape(img.shape)

        # convert back to BGR before writing, since cv2.imwrite expects BGR
        cv2.imwrite(image.split('/IMG')[0] + '/results/' + image.split('kmeans/')[1].split('.jpeg')[0] + '_k-%s.jpeg' % k,
                    cv2.cvtColor(result_image, cv2.COLOR_RGB2BGR))

        fig, axs = plt.subplots(1, 2, figsize=(20, 20))
        axs[0].imshow(img0)
        axs[0].set_title('Original Image', fontsize=26)
        axs[1].imshow(result_image)
        axs[1].set_title('Result Image, k=%s' % int(k), fontsize=26)

        # colorbar keyed to the centroid colors
        cmap = mpl.colors.ListedColormap(center / 255)
        bounds = np.arange(len(center) + 1)
        norm = mpl.colors.BoundaryNorm(bounds, cmap.N)
        cb_ax = fig.add_axes([0.95, 0.35, 0.02, 0.3])
        cbar = fig.colorbar(mpl.cm.ScalarMappable(cmap=cmap, norm=norm), cax=cb_ax, ticks=bounds, orientation='vertical')

        for ax in axs:
            ax.set_xticks([])
            ax.set_yticks([])

        plt.show()
/Users/mcmahonmc/Github/machine-learning-2021/01_kmeans/IMG_1760.jpeg
/Users/mcmahonmc/Github/machine-learning-2021/01_kmeans/IMG_9548.jpeg
/Users/mcmahonmc/Github/machine-learning-2021/01_kmeans/IMG_2085.jpeg
res
array([[213, 210, 206],
[213, 210, 206],
[213, 210, 206],
...,
[ 74, 68, 61],
[ 74, 68, 61],
[ 74, 68, 61]], dtype=uint8)
center
array([[242, 219, 191],
[ 74, 68, 61],
[147, 133, 116],
[213, 210, 206],
[187, 174, 154],
[ 44, 36, 28],
[ 97, 93, 89]], dtype=uint8)
label
array([[3],
[3],
[3],
...,
[1],
[1],
[1]], dtype=int32)
img0[0, 0, :]
array([187, 202, 205], dtype=uint8)
vectorized[0]
array([187., 202., 205.], dtype=float32)
img0[0, 1, :]
array([188, 201, 207], dtype=uint8)
vectorized[1]
array([188., 201., 207.], dtype=float32)
len(img0[:,0,0])
4032
len(img0[0,:,0])
3024
vectorized.shape
(12192768, 3)
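The spot checks above confirm that `reshape((-1, 3))` flattens pixels in row-major order, so position `(i, j)` in an MxN image maps to row `i * N + j` of `vectorized`. The label-to-centroid lookup `center[label.flatten()]` is plain NumPy fancy indexing; a minimal sketch with a toy 2x2 image (pixel and centroid values are made up for illustration):

```python
import numpy as np

# toy 2x2 RGB "image" (hypothetical values)
img = np.array([[[255, 0, 0], [250, 5, 5]],
                [[0, 0, 255], [5, 5, 250]]], dtype=np.uint8)

# flatten to a (M*N, 3) pixel array; row-major: pixel (i, j) -> row i*N + j
vec = img.reshape((-1, 3))
assert np.array_equal(vec[0 * 2 + 1], img[0, 1])

# pretend k-means returned these centroids and labels (k=2)
center = np.array([[252, 2, 2], [2, 2, 252]], dtype=np.uint8)
label = np.array([0, 0, 1, 1])

# fancy indexing: replace each pixel with its centroid's color
res = center[label]
result = res.reshape(img.shape)
print(result[0, 0])  # -> [252   2   2]
```

This round trip (flatten, cluster, index back, reshape) is exactly what produces `result_image` in the loop above.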
from sklearn.preprocessing import minmax_scale
for image in image_files:
    print(image)

    img0 = cv2.imread(image)
    img0 = cv2.cvtColor(img0, cv2.COLOR_BGR2RGB)

    # create an (x, y) coordinate pair for every pixel
    x_ = np.zeros(len(img0[0, :, 0]))
    x = []
    for i in range(0, len(img0[:, 0, 0])):
        x = np.append(x, x_)
        x_ = x_ + 1
    y_ = np.arange(0, len(img0[0, :, 0]))
    y = np.tile(y_, len(img0[:, 0, 0]))
    xy = np.vstack((x, y)).transpose().astype(int)

    # convert the MxNx3 image to a (M*N)x3 array of pixels
    vectorized = img0.reshape((-1, 3))

    # add spatial coordinates, rescaled to the same 0-255 range as the
    # color channels so they don't dominate the distance computation
    xy_scaled = minmax_scale(xy, feature_range=(0, 255))
    vec_xy = np.concatenate((vectorized, xy_scaled), axis=1).astype(np.float32)

    for k in K:
        ret, label, center = cv2.kmeans(vec_xy, k, None, criteria, attempts, cv2.KMEANS_PP_CENTERS)
        center = np.uint8(center)

        # map labels back to 5-D centroids, then keep only the RGB channels
        res2 = center[label.flatten()]
        result_image2 = res2[:, 0:3].reshape(img0.shape)

        # convert back to BGR before writing, since cv2.imwrite expects BGR
        cv2.imwrite(image.split('/IMG')[0] + '/results/' + image.split('kmeans/')[1].split('.jpeg')[0] + '_xy_k-%s.jpeg' % k,
                    cv2.cvtColor(result_image2, cv2.COLOR_RGB2BGR))

        fig, axs = plt.subplots(1, 2, figsize=(20, 20))
        axs[0].imshow(img0)
        axs[0].set_title('Original Image', fontsize=26)
        axs[1].imshow(result_image2)
        axs[1].set_title('Result Image, k=%s' % int(k), fontsize=26)

        # colorbar keyed to the centroid colors (RGB channels only)
        cmap = mpl.colors.ListedColormap(center[:, 0:3] / 255)
        bounds = np.arange(len(center) + 1)
        norm = mpl.colors.BoundaryNorm(bounds, cmap.N)
        cb_ax = fig.add_axes([0.95, 0.35, 0.02, 0.3])
        cbar = fig.colorbar(mpl.cm.ScalarMappable(cmap=cmap, norm=norm), cax=cb_ax, ticks=bounds, orientation='vertical')

        for ax in axs:
            ax.set_xticks([])
            ax.set_yticks([])

        plt.show()
/Users/mcmahonmc/Github/machine-learning-2021/01_kmeans/IMG_1760.jpeg
/Users/mcmahonmc/Github/machine-learning-2021/01_kmeans/IMG_9548.jpeg
/Users/mcmahonmc/Github/machine-learning-2021/01_kmeans/IMG_2085.jpeg
res2
array([[209, 211, 205, 50, 56],
[209, 211, 205, 50, 56],
[209, 211, 205, 50, 56],
...,
[ 84, 79, 76, 209, 180],
[ 84, 79, 76, 209, 180],
[ 84, 79, 76, 209, 180]], dtype=uint8)
res2.shape
(12192768, 5)
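As an aside, the coordinate grid built with the `np.append` loop above can also be produced in a single vectorized step with `np.indices`, and the min-max scaling to the 0-255 range can be done directly in NumPy. A sketch on a hypothetical 3x4 image size (the output order matches the loop/tile construction, it is just much faster on 12-megapixel inputs):

```python
import numpy as np

M, N = 3, 4  # hypothetical image height and width

# vectorized equivalent of the append/tile loop: row and column index per pixel
rows, cols = np.indices((M, N))
xy = np.stack((rows.ravel(), cols.ravel()), axis=1)  # shape (M*N, 2)

# min-max scale each coordinate column to [0, 255], matching
# minmax_scale(xy, feature_range=(0, 255))
lo, hi = xy.min(axis=0), xy.max(axis=0)
xy_scaled = (xy - lo) / (hi - lo) * 255.0

print(xy[:3])                        # [[0 0] [0 1] [0 2]]
print(xy_scaled[0], xy_scaled[-1])   # first pixel -> [0. 0.], last -> [255. 255.]
```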
Note: Because the images are somewhat large and there are many of them, I am showing them above and decided not to repeat them here as I go through the results. Please let me know if that's a burden and I can change it for the next time.
Report
In this exercise, I applied k-means clustering to perform image segmentation on three pictures taken from a trip to Big Bend National Park and Marfa. I found that this method was easily adapted to each new example and could create clusters that were not limited to a predefined geometric form (circle, triangle, etc.). In each image, pixels were assigned to clusters based on (Exp 1) their RGB color profile and (Exp 2) their RGB values plus (x, y) coordinates.
One caveat to this approach is that it relies on randomized centroid initialization (here, k-means++ seeding), so the clustering results varied across runs: sometimes elements of the images were well segmented and sometimes they were not (e.g., in image 1, the cacti in the foreground were colored green on some runs and brown like the mountains on others). Increasing k sometimes allowed for better separation of unique elements (e.g., the sky became apparent in image 2, and the parking lot and sky in image 3), but not always. For instance, image 1 had a relatively consistent color palette and clustering result across values of k. While this method can detect clusters of varying sizes and densities, in this example they were not visually well separated, so the result overemphasized shades of brown; this became more apparent at larger values of k. It is possible that if a starting centroid had been chosen closer to the cactus, a higher value of k would have given a greater probability of detecting that element in the resulting segmentation. But that comes at the expense of overfitting to what might be considered non-unique clusters, yielding the observed palette of many shades of brown. This method may perform better on images with strong color contrast than on images with nuanced color profiles.
One possibility for overcoming this that we explored was including spatial information (x, y coordinates; Exp 2). It was necessary to scale the coordinates before feeding them to the k-means algorithm to avoid prioritizing the spatial features at the expense of the color features. Including the spatial coordinates resulted in better separation in image 1, where the green cacti are now identified as their own cluster, and this separation appeared to improve with higher k. However, the spatial features also led to overfitting across all images and all values of k, with more overfitting at higher k. The lower color contrast across pixels in image 1 was better suited to this method; the images with more varied color profiles (images 2 and 3) produced ill-defined clusters compared to their Exp 1 results.
From these results, we can see that the optimal value of k, and the decision about whether to include spatial information alongside color as input to the clustering algorithm, depends on the image: there is no single best approach. Choosing k manually for each image is difficult, and because the results vary across runs, one value of k may perform well on one iteration but not the next (this is where setting a seed would be very important). Increasing k and/or including spatial features can improve segmentation results, but also risks overfitting, producing clusters that are not visually distinct.
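On the seeding point: with OpenCV, calling `cv2.setRNGSeed(...)` before each `cv2.kmeans` call should make the initialization repeatable. The idea can be illustrated with a small NumPy-only k-means sketch (a simplified stand-in for `cv2.kmeans`, not the notebook's actual pipeline): fixing the RNG seed fixes the random centroid initialization, and therefore the clustering, across runs.

```python
import numpy as np

def kmeans(pixels, k, seed, iters=20):
    """Minimal k-means with seeded random initialization (illustrative only)."""
    rng = np.random.default_rng(seed)
    # random initialization: pick k distinct pixels as starting centroids
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each pixel to its nearest centroid
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned pixels
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(axis=0)
    return labels, centers

# hypothetical "pixel" data standing in for a flattened image
data_rng = np.random.default_rng(0)
pixels = data_rng.integers(0, 256, size=(500, 3)).astype(float)

labels_a, centers_a = kmeans(pixels, k=3, seed=42)
labels_b, centers_b = kmeans(pixels, k=3, seed=42)

# same seed -> identical initialization -> identical clustering
print(np.array_equal(labels_a, labels_b))  # True
```

A different seed would generally give a different initialization and, for images like image 1, potentially a different segmentation, which is exactly the run-to-run variability described above.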